Optimizing Latency and Throughput Trade-offs in a Stream Processing System
Abstract
The value of stream processing systems stems largely from the timeliness of the results they provide. Early stream processors followed the record-at-a-time approach, servicing each data point as soon as it arrives at the system. While these systems provide good latency, their behavior becomes less desirable when applications require high throughput, fault tolerance, or the use of stateful or blocking operators. More recently, systems have been developed that follow the micro-batch approach, in which many records are processed together as small batches, providing the missing features seemingly at the cost of latency. Given this situation, we examine a micro-batch streaming system, Spark Streaming, and investigate how well the micro-batch architecture can handle latency-sensitive workloads. We instrumented Spark Streaming to understand where time goes and where development effort should be focused. In this paper we make three contributions. First, we provide an analysis of the performance of Spark Streaming, showing the average time breakdown within the system. Second, we identify the performance and scalability bottlenecks of Spark Streaming and pinpoint the underlying deficiencies of the system. Last, we propose and evaluate several optimizations to reduce the system overhead and achieve lower latency while maintaining throughput.
Similar resources
Stream ciphers and the eSTREAM project
Stream ciphers are an important class of symmetric cryptographic algorithms. The eSTREAM project contributed significantly to the recent increase of activity in this field. In this paper, we present a survey of the eSTREAM project. We also review recent time/memory/data and time/memory/key trade-offs relevant for the generic attacks on stream ciphers.
Quantifying Latency and Throughput Compromises in CMP Design
Designers of chip multiprocessors will increasingly be called upon to optimize for a combination of design metrics under a variety of design constraints. The adoption of chip multiprocessors has also led to a shift in design metrics toward aggregate throughput and away from single thread latency. We examine the compromises between latency and throughput under various power, thermal, area, and b...
Memory Subsystem Design for Multithreaded Processors
Multithreading processors pose new challenges and new opportunities for cache/memory hierarchy design. Multithreading significantly alters the data reference stream seen by the memory subsystem. Multithreading also demands very different performance characteristics from the cache hierarchy than a typical (uniprocessor) CPU. This paper is specifically concerned with memory hierarchy design consi...
PrivApprox: Privacy-Preserving Stream Analytics
How can users’ privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three important properties: (i) Privacy: zero-knowledge privacy guarantee for users, a privacy bound tighter th...
Publication date: 2014